Use this template to complete your project throughout the course. Your Final Project presentation in class will be based on the contents of this document. Replace the title/name and text below with your own, but keep the headers.

Overview

The goal of my project is to explore the potential relationship between alcohol consumption and geographic/economic causes, such as median household income. I use the dataset from GHDx named United States Alcohol Use Prevalence by County 2002-2012 to get information of the alcohol pattern across the U.S. (by year and sex). Additionally, I use a map shapefile from the ZevRoss to create map plots, and data from the American Community Survey (ACS) to get the information of median household income.

My Github Repository: Slin94

Faculty/Staff: Dr. Bomyi Lim, Sherrie Xie (Graduate student), Dr. Blanca Himes Dr. Bomyi Lim: helped shape my idea. Sherrie Xie (Ph.D. student): helped with maps and the correlation test Dr. Blanca Himes: helped with maps and notified some limitations of my study.

Introduction

Alcohol consumption is the 7th leading risk factor for both death and disability all over the world. It led to 2.8 million deaths in 2016. Although several previous research suggests that moderate level of alcohol consumption brings benefit to humans, this finding has resently been questioned. As a matter of fact, a new study published in 2018 (Griswold et al. 2018) states that the level of alcohol consumption which minimize harm on human health is zero. Additionally, negative effect of alcohol on health outcomes raise with the increase in the level of consumption. Therefore, while it is impossible to fully stop populational alcohol use, analyzing factors which can potentially affect alcohol consumption, such as geographic or economic factors, helps develop preventive strategies.

Methods

+I use the dataset from GHDx named United States Alcohol Use Prevalence by County 2002-2012 to get information of the alcohol pattern across the U.S. (by year and sex). Additionally, I use a map shapefile from the US Census Bureau to create map plots, and data from the American Community Survey (ACS) to get the information of median income/unemployment/poverty rate.

In the first paragraph, describe the data used and general methodological approach. Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why.

## 
## The downloaded binary packages are in
##  /var/folders/6c/wrs0172s3dzg403t4945383w0000gn/T//RtmpWOdNs9/downloaded_packages
## 
## The downloaded binary packages are in
##  /var/folders/6c/wrs0172s3dzg403t4945383w0000gn/T//RtmpWOdNs9/downloaded_packages
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ tibble  2.1.3     ✔ purrr   0.3.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
## Loading required package: sp
## Checking rgeos availability: TRUE
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths

Import and clean up alcohol consumption data preliminarily

## Classes 'tbl_df', 'tbl' and 'data.frame':    3179 obs. of  41 variables:
##  $ State                               : chr  "National" "Alabama" "Alabama" "Alabama" ...
##  $ Location                            : chr  "United States" "Alabama" "Autauga County" "Baldwin County" ...
##  $ 2002 Both Sexes                     : num  55.4 40.7 39.4 54 36 31.1 31.9 34.9 34.5 38 ...
##  $ 2002 Females                        : num  47.5 32.1 29.4 45.7 27.3 21.7 22.1 26.5 25.8 27.9 ...
##  $ 2002 Males                          : num  63.7 49.6 49.7 62.5 45 40.7 42.1 43.6 43.4 48.4 ...
##  $ 2003 Both Sexes                     : num  56.6 42.3 40.6 54.9 37.9 33.3 34.1 37 35.5 39.5 ...
##  $ 2003 Females                        : num  48.9 33.8 31 47 29.2 24 24 28.3 26.9 29.3 ...
##  $ 2003 Males                          : num  64.6 51.2 50.6 63.2 46.9 42.9 44.5 46.1 44.5 50.1 ...
##  $ 2004 Both Sexes                     : num  55.2 41 39.2 53.1 35.8 32.3 33.1 34.5 33.8 38.4 ...
##  $ 2004 Females                        : num  47.6 32.9 30.1 45.2 27.4 23.5 23.5 26.2 25.6 28.7 ...
##  $ 2004 Males                          : num  63.2 49.5 48.6 61.3 44.5 41.5 43.1 43.2 42.3 48.3 ...
##  $ 2005 Both Sexes                     : num  56 42.3 40.7 54.4 37.4 33.9 34.5 34.2 35.2 39.6 ...
##  $ 2005 Females                        : num  48.7 33.8 31.6 46.5 28.7 24.7 24.3 25.5 26.6 29.8 ...
##  $ 2005 Males                          : num  63.5 51.1 50.1 62.7 46.4 43.4 45 43.2 44 49.8 ...
##  $ 2006 Both Sexes                     : num  55.4 40.7 38.8 53.1 34.4 32 32.5 33.9 34 37.8 ...
##  $ 2006 Females                        : num  48.5 33.9 31.7 46.9 27.3 24.6 23.8 26.5 27.2 29.8 ...
##  $ 2006 Males                          : num  62.7 47.9 46.1 59.6 41.7 39.7 41.6 41.6 41.1 46.1 ...
##  $ 2007 Both Sexes                     : num  56 41.6 40.6 54.2 35.5 33.1 33.3 34.2 34.8 38.8 ...
##  $ 2007 Females                        : num  48.9 34.1 33.1 46.9 27.6 25.1 23.8 26 27.3 30.3 ...
##  $ 2007 Males                          : num  63.4 49.4 48.3 61.7 43.7 41.5 43.2 42.6 42.6 47.6 ...
##  $ 2008 Both Sexes                     : num  56.1 42.6 41.2 55.1 37 34.2 34.1 36.5 35.6 38.9 ...
##  $ 2008 Females                        : num  49 35.5 33.7 48.2 29.1 26.7 25.3 28.6 27.8 31.2 ...
##  $ 2008 Males                          : num  63.4 50 48.9 62.3 45.3 42.1 43.2 44.7 43.7 46.9 ...
##  $ 2009 Both Sexes                     : num  56 41.3 40.8 53.6 35.2 33.3 32.3 36.2 33.2 37.3 ...
##  $ 2009 Females                        : num  48.9 33.9 32.7 46 27.2 25.4 23.5 27.7 24.5 29.3 ...
##  $ 2009 Males                          : num  63.3 49 49.2 61.5 43.5 41.5 41.5 45 42.1 45.5 ...
##  $ 2010 Both Sexes                     : num  56.1 42.5 42.5 54.6 38.6 34.4 33.5 37.6 35.2 37.3 ...
##  $ 2010 Females                        : num  49.3 35.3 34.4 47.3 30.2 26.5 24.7 28 26.4 29.7 ...
##  $ 2010 Males                          : num  63.2 50.1 50.9 62.1 47.4 42.6 42.6 47.7 44.4 45.2 ...
##  $ 2011 Both Sexes                     : num  57.1 44.8 43.6 57 39.4 36.7 36 38.9 38.2 39.6 ...
##  $ 2011 Females                        : num  50.6 37.3 35.4 50 31 28.5 27 29.1 28.8 31.6 ...
##  $ 2011 Males                          : num  63.9 52.5 52 64.3 48.2 45.2 45.4 49.1 47.9 47.8 ...
##  $ 2012 Both Sexes                     : num  56 43.6 42.5 55.7 37.6 35.8 34.9 37.1 35.8 38.2 ...
##  $ 2012 Females                        : num  49.1 36.5 34.4 48.8 29.2 28 26.3 27.3 26.6 30.1 ...
##  $ 2012 Males                          : num  63 51 50.9 62.8 46.3 43.9 43.9 47.3 45.3 46.6 ...
##  $ Percent Change 2002-2012, Both Sexes: num  0.9 7.2 7.9 3.2 4.6 15.3 9.4 6.4 3.8 0.5 ...
##  $ Percent Change 2002-2012, Females   : num  3.5 13.5 17.1 6.8 7.2 29.1 19.3 3.3 3 7.8 ...
##  $ Percent Change 2002-2012, Males     : num  -1 2.9 2.3 0.4 3 7.7 4.1 8.3 4.3 -3.8 ...
##  $ Percent Change 2005-2012, Both Sexes: num  0 3.1 4.5 2.2 0.7 5.7 1.4 8.6 1.7 -3.6 ...
##  $ Percent Change 2005-2012, Females   : num  0.9 7.8 8.9 5 1.9 13.6 8.4 7.2 -0.2 1.1 ...
##  $ Percent Change 2005-2012, Males     : num  -0.8 -0.1 1.6 0.1 -0.1 1.1 -2.5 9.5 2.9 -6.5 ...
## Classes 'tbl_df', 'tbl' and 'data.frame':    3179 obs. of  29 variables:
##  $ State                               : chr  "National" "Alabama" "Alabama" "Alabama" ...
##  $ Location                            : chr  "United States" "Alabama" "Autauga County" "Baldwin County" ...
##  $ 2005 Both Sexes                     : num  7 6 6.2 9 5.4 5.1 4.8 5.3 5.4 4.9 ...
##  $ 2005 Females                        : num  5.2 3.6 3.4 6.7 2.5 2.3 2.1 1.7 2.3 2.7 ...
##  $ 2005 Males                          : num  8.9 8.4 9 11.3 8.3 8.1 7.6 9.1 8.7 7.2 ...
##  $ 2006 Both Sexes                     : num  7.2 5.6 5.6 8.4 5 4.8 4.4 5.2 5.2 4.5 ...
##  $ 2006 Females                        : num  5.5 3.4 3 6.1 2 2 1.9 1.6 2 2.5 ...
##  $ 2006 Males                          : num  8.9 7.9 8.3 10.8 8.1 7.6 7.1 8.9 8.4 6.5 ...
##  $ 2007 Both Sexes                     : num  7.7 5.6 5.6 8.4 5.1 4.7 4.4 5.2 5.2 4.5 ...
##  $ 2007 Females                        : num  5.9 3.5 3.3 6.1 1.9 2.1 1.9 1.5 1.9 2.6 ...
##  $ 2007 Males                          : num  9.5 7.8 8 10.8 8.3 7.5 7 9 8.5 6.4 ...
##  $ 2008 Both Sexes                     : num  7.8 6.2 6 9.1 5.6 5.2 4.8 5.8 5.7 4.9 ...
##  $ 2008 Females                        : num  6.1 3.8 3.5 6.5 2.1 2.3 1.9 1.7 2.1 2.8 ...
##  $ 2008 Males                          : num  9.6 8.6 8.5 11.8 9.3 8.2 7.7 10.1 9.4 7.1 ...
##  $ 2009 Both Sexes                     : num  7.7 6 5.7 8.6 5.5 5 4.6 6.1 5.3 4.8 ...
##  $ 2009 Females                        : num  6.1 3.7 3.4 6 1.8 2.1 1.8 1.8 1.7 2.8 ...
##  $ 2009 Males                          : num  9.3 8.3 8.1 11.3 9.3 7.9 7.4 10.5 9.1 6.8 ...
##  $ 2010 Both Sexes                     : num  7.8 6 5.8 8.7 6.1 5 4.6 6.5 5.5 4.7 ...
##  $ 2010 Females                        : num  6.2 3.9 3.6 6.4 2.3 2.1 1.7 2.1 1.9 2.7 ...
##  $ 2010 Males                          : num  9.4 8.3 8.1 11.1 10 8 7.5 11.1 9.3 6.8 ...
##  $ 2011 Both Sexes                     : num  8.8 7.3 6.9 10.4 7.4 6.1 5.5 7.9 6.9 5.9 ...
##  $ 2011 Females                        : num  7 4.6 4 7.8 2.5 2.6 2.2 2.4 2.5 3.3 ...
##  $ 2011 Males                          : num  10.6 10.1 10 13.1 12.4 9.7 9 13.7 11.5 8.5 ...
##  $ 2012 Both Sexes                     : num  8.2 6.6 6.3 9.3 6.4 5.5 5 7 6 5.3 ...
##  $ 2012 Females                        : num  6.7 4.3 3.8 7.3 2.3 2.5 2 2.3 2.3 3.1 ...
##  $ 2012 Males                          : num  9.9 8.9 8.8 11.4 10.5 8.6 8.1 11.9 9.8 7.6 ...
##  $ Percent Change 2005-2012, Both Sexes: num  17.2 9.8 1.7 3.4 18.3 7.2 4.6 31.8 10.5 8.3 ...
##  $ Percent Change 2005-2012, Females   : num  28.2 19.1 12.2 7.7 -7.6 8.7 -4.3 38.5 1.4 14.6 ...
##  $ Percent Change 2005-2012, Males     : num  10.5 5.6 -2.4 0.8 26.4 6.7 7.2 30.6 13 5.9 ...
## Classes 'tbl_df', 'tbl' and 'data.frame':    3179 obs. of  41 variables:
##  $ State                               : chr  "National" "Alabama" "Alabama" "Alabama" ...
##  $ Location                            : chr  "United States" "Alabama" "Autauga County" "Baldwin County" ...
##  $ 2002 Both Sexes                     : num  17.3 13.2 13.6 17.9 12.8 11.3 10.2 12.2 11.9 12.2 ...
##  $ 2002 Females                        : num  10.8 6.3 6.6 9.7 5.4 4.4 4.1 4.4 4.9 5.6 ...
##  $ 2002 Males                          : num  24.1 20.4 20.8 26.4 20.4 18.5 16.5 20.2 19 19.1 ...
##  $ 2003 Both Sexes                     : num  17.8 13.1 13.3 17.4 12.5 11.3 10.4 12.5 11.6 12.1 ...
##  $ 2003 Females                        : num  11.2 6.6 6.9 9.8 5.6 4.8 4.5 4.9 5.1 5.9 ...
##  $ 2003 Males                          : num  24.7 19.8 20 25.3 19.6 18 16.5 20.3 18.4 18.6 ...
##  $ 2004 Both Sexes                     : num  17.1 13.2 13 17.4 12.1 11.4 10.6 12.2 11.7 12.3 ...
##  $ 2004 Females                        : num  10.7 6.7 6.5 9.5 5.2 4.9 4.6 4.5 4.9 6 ...
##  $ 2004 Males                          : num  23.6 20 19.8 25.7 19.4 18.1 16.9 20.2 18.7 18.8 ...
##  $ 2005 Both Sexes                     : num  16.8 12.8 12.8 16.8 11.9 11 10.5 11.2 10.9 11.8 ...
##  $ 2005 Females                        : num  10.6 6.8 6.8 9.6 5.4 5.1 4.8 4.2 4.8 6.1 ...
##  $ 2005 Males                          : num  23.4 19 18.9 24.3 18.6 17.2 16.4 18.5 17.3 17.8 ...
##  $ 2006 Both Sexes                     : num  16.8 12.5 12.1 16.6 10.7 10.5 10 11.5 10.7 11.4 ...
##  $ 2006 Females                        : num  10.8 7.3 7 10.3 5 5.3 4.9 4.8 5.1 6.4 ...
##  $ 2006 Males                          : num  22.9 17.9 17.4 23.1 16.7 15.9 15.4 18.4 16.5 16.6 ...
##  $ 2007 Both Sexes                     : num  17.5 12.9 12.8 16.9 11.1 10.9 10.3 11.8 11 11.7 ...
##  $ 2007 Females                        : num  11.4 7.5 7.5 10.4 5 5.5 5 4.9 5.2 6.6 ...
##  $ 2007 Males                          : num  23.8 18.5 18.2 23.7 17.3 16.4 15.9 19 17.1 16.9 ...
##  $ 2008 Both Sexes                     : num  17.4 13.7 13.1 17.7 12 11.6 10.8 12.8 11.9 12.3 ...
##  $ 2008 Females                        : num  11.4 7.7 7.3 10.5 5.2 5.7 4.8 5.2 5.3 6.6 ...
##  $ 2008 Males                          : num  23.6 19.9 19.2 25.2 19.1 17.8 16.9 20.7 18.8 18.3 ...
##  $ 2009 Both Sexes                     : num  17.6 13.4 13 17 11.5 11.7 10.5 13.2 11.2 12 ...
##  $ 2009 Females                        : num  11.5 7.6 7.4 10 5 5.6 4.6 5.4 4.8 6.4 ...
##  $ 2009 Males                          : num  23.8 19.4 18.8 24.2 18.3 17.9 16.6 21.2 17.9 17.8 ...
##  $ 2010 Both Sexes                     : num  17.8 13.4 13.3 16.7 12.7 11.4 10.2 14 11.8 11.6 ...
##  $ 2010 Females                        : num  11.8 7.8 7.9 9.9 5.9 5.5 4.4 5.9 5.4 6.3 ...
##  $ 2010 Males                          : num  24 19.1 18.8 23.7 19.8 17.5 16.2 22.4 18.3 17.2 ...
##  $ 2011 Both Sexes                     : num  19 14.7 14.4 18.6 13.5 12.4 11.3 15.4 13.3 13 ...
##  $ 2011 Females                        : num  12.8 8.9 8.8 11.7 6.3 6.3 5.2 6.8 6.4 7.3 ...
##  $ 2011 Males                          : num  25.6 20.7 20.1 25.8 21 18.8 17.6 24.4 20.4 18.8 ...
##  $ 2012 Both Sexes                     : num  18.3 13.4 13.2 16.9 12.4 11.4 10.3 14.1 12 11.8 ...
##  $ 2012 Females                        : num  12.4 7.9 7.9 10.4 5.4 5.7 4.6 5.9 5.5 6.4 ...
##  $ 2012 Males                          : num  24.5 19.1 18.7 23.7 19.6 17.4 16.2 22.7 18.7 17.5 ...
##  $ Percent Change 2002-2012, Both Sexes: num  5.8 1.2 -2.7 -5.7 -3.1 0.9 0.9 16.2 1.2 -3.2 ...
##  $ Percent Change 2002-2012, Females   : num  14.8 24.9 20.4 6.4 0.8 29.3 11.1 33.2 11.7 14.2 ...
##  $ Percent Change 2002-2012, Males     : num  1.6 -6.4 -10.3 -10.3 -4.1 -6.1 -1.8 12.4 -1.6 -8.5 ...
##  $ Percent Change 2005-2012, Both Sexes: num  8.9 4.6 3.4 0.4 4.1 3.3 -2 25.9 10 0.2 ...
##  $ Percent Change 2005-2012, Females   : num  17.5 15.2 16 7.7 0.4 11.5 -4.6 39.4 15.8 5.2 ...
##  $ Percent Change 2005-2012, Males     : num  4.9 0.7 -1.3 -2.7 5.2 0.8 -1.2 22.7 8.3 -1.6 ...

First, I compare national change of alcohol use (by sexes)

## Warning: Ignoring unknown parameters: stat

## Warning: Ignoring unknown parameters: stat

Given that the data of heavy alcohol use is not available from 2002-2004, I use data from 2005-2012 in later analyzation. The bar plots show that the change in alcohol prevalence differs by sex and time (2005 vs. 2012). Therefore, I make a box plot to analyze this difference in 2005 and 2012.

## No id variables; using all as measure variables
## 
## Call:
## glm(formula = Sexes ~ Prevalence, family = binomial(), data = box_heavy_state_change)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.59777  -0.80146  -0.01314   0.79425   1.90621  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.72360    0.60452  -4.505 6.63e-06 ***
## Prevalence   0.10526    0.02172   4.847 1.25e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.40  on 101  degrees of freedom
## Residual deviance: 100.65  on 100  degrees of freedom
## AIC: 104.65
## 
## Number of Fisher Scoring iterations: 5
## Waiting for profiling to be done...
##                  2.5 %     97.5 %
## (Intercept) -4.0260207 -1.6365349
## Prevalence   0.0666349  0.1523813
## Waiting for profiling to be done...
##                     OR     2.5 %    97.5 %
## (Intercept) 0.06563821 0.0178452 0.1946534
## Prevalence  1.11099902 1.0689051 1.1646042

## No id variables; using all as measure variables
## 
## Call:
## glm(formula = Sexes ~ Prevalence, family = binomial(), data = box_binge_state_change)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8983  -0.6573  -0.0909   0.6486   3.2942  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.49684    0.54335  -4.595 4.32e-06 ***
## Prevalence   0.19895    0.03906   5.093 3.53e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 141.402  on 101  degrees of freedom
## Residual deviance:  93.338  on 100  degrees of freedom
## AIC: 97.338
## 
## Number of Fisher Scoring iterations: 5
## Waiting for profiling to be done...
##                  2.5 %     97.5 %
## (Intercept) -3.6703990 -1.5204899
## Prevalence   0.1296417  0.2841977
## Waiting for profiling to be done...
##                     OR      2.5 %    97.5 %
## (Intercept) 0.08234443 0.02546631 0.2186048
## Prevalence  1.22011564 1.13842038 1.3286956

Results

To analyze whether geographic location (different states) has impact on alcohol consumption level, I make 2X3 (heavy/binge vs. both/females/males) maps to visualize geographic impact

## Reading layer `acs_2012_2016_county_us_B27001' from data source `/Users/slin/BMIN503_Final_Project/acs_2012_2016_county_us_B27001/acs_2012_2016_county_us_B27001.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3141 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -2031905 ymin: -2427680 xmax: 2516374 ymax: 732103.3
## epsg (SRID):    NA
## proj4string:    +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
#See the changes at state level
#import state code, clean up
State_code <- read_excel("State Code.xlsx")
Code.update <- State_code$Code
Code.new <- str_pad(Code.update, 2, pad = "0")
State_code <- State_code%>%
  mutate(Code = as.character(Code.new))

#join state code to alcohol_heavy_state_change
alcohol_heavy_state_change <- inner_join(alcohol_heavy_state_change, State_code, by = "State")

#join state code to alcohol_binge_state_change
alcohol_binge_state_change <- inner_join(alcohol_binge_state_change, State_code, by = "State")

#aggregate map information by state, join to cleaned heavy/binge data
US_states <- US_sf%>%
  dplyr::select(geometry)%>%
  aggregate(by = list(US_sf$Code), FUN = mean)%>%
  rename(Code = "Group.1")%>%
  mutate(Code = as.character(Code))
US_states_heavy <- inner_join(US_states, alcohol_heavy_state_change, by = "Code")
US_states_binge <- inner_join(US_states, alcohol_binge_state_change, by = "Code")

#plot
#set min&max for heavy
prev_h_both_min <- sort(US_states_heavy$Both)[1]
prev_h_both_max <- sort(US_states_heavy$Both, decreasing = TRUE)[1]
prev_h_male_min <- sort(US_states_heavy$Males)[1]
prev_h_male_max <- sort(US_states_heavy$Males, decreasing = TRUE)[1]
prev_h_female_min <- sort(US_states_heavy$Females)[1]
prev_h_female_max <- sort(US_states_heavy$Females, decreasing = TRUE)[1]

#set min&max for binge
prev_b_both_min <- sort(US_states_binge$Both)[1]
prev_b_both_max <- sort(US_states_binge$Both, decreasing = TRUE)[1]
prev_b_male_min <- sort(US_states_binge$Males)[1]
prev_b_male_max <- sort(US_states_binge$Males, decreasing = TRUE)[1]
prev_b_female_min <- sort(US_states_binge$Females)[1]
prev_b_female_max <- sort(US_states_binge$Females, decreasing = TRUE)[1]

#set theme
my_theme <- function() {
  theme_minimal() +                                  
  theme(axis.line = element_blank(),                 
        axis.text = element_blank(),                 
        axis.title = element_blank(),
        panel.grid = element_line(color = "white"),  
        legend.key.size = unit(0.6, "cm"),          
        legend.text = element_text(size = 12),       
        legend.title = element_text(size = 12),
        plot.title = element_text(size = 16))      
}

#set color for both, males, and females
myPalette_both <- colorRampPalette(brewer.pal(9, "Greens"))
myPalette_male <- colorRampPalette(brewer.pal(9, "Blues"))
myPalette_female <- colorRampPalette(brewer.pal(9, "Reds"))

#plot:heavy&binge, both
ggplot() + 
  geom_sf(data = US_states_heavy, aes(fill = Both), lwd = 0) +
  my_theme() +
  ggtitle("Change in Age-standardized Prevalence of Heavy Drinking\n 2005 to 2012, Both Sexes") +
  scale_fill_gradientn(name = "Prevalence\nChange (%)", colours = myPalette_both(100),
                       limit = range(prev_h_both_min, prev_h_both_max)) 

Also, given that there is a large gap between 2002/2005 vs. 2012 alcohol consomption in heavy & binge, I check how they change over the ten years

#get national alcohol use values (2002-2012)
changes_national <- combine_alcohol%>%
  filter(State == "National")
changes_national <- changes_national[3:35]

#clean up
any_changes <- as.data.frame(matrix(as.numeric(changes_national[1,]), nrow = 3, ncol = 11))
heavy_changes <- as.data.frame(matrix(as.numeric(changes_national[2,]), nrow = 3, ncol = 11))
binge_changes <- as.data.frame(matrix(as.numeric(changes_national[3,]), nrow = 3, ncol = 11))
combine_changes_national <- rbind(any_changes, heavy_changes, binge_changes)
combine_changes_national$Degree <-c(rep("Any", 3), rep("Heavy", 3), rep("Binge",3))
combine_changes_national <- combine_changes_national%>%
  rename("2002" = "V1", "2003" = "V2", "2004" = "V3", "2005" = "V4", "2006" = "V5", "2007" = "V6", "2008" = "V7", "2009" = "V8", "2010" = "V9", "2011" = "V10", "2012" = "V11")

#Plot change in heavy alcohol use from 2002 to 2012
#clean up
combine_changes_national <- melt(combine_changes_national, id.vars = "Degree")%>%
  rename("Year" = "variable")
combine_changes_national$Sexes <- rep(c("Both", "Females", "Males"), 33)
combine_changes_national_heavy <- combine_changes_national%>%
  filter(Degree == "Heavy")
  #can use drop_na() to drop N/A data from 2002-2005

#plot
ggplot(combine_changes_national_heavy, aes(x = Year, y = value, colour = Sexes)) + 
  geom_line(aes(x = Year, y = value, group = Sexes)) +
  scale_color_manual("Sexes", values = c("Both" = "green3", "Females" = "coral1", "Males" = "cornflowerblue")) +
  ylab("Change in heavy alcohol use from 2002 to 2012 (%)")
## Warning: Removed 9 rows containing missing values (geom_path).

Census article–>median household income

## Your original .Renviron will be backed up and stored in your R HOME directory if needed.
## Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY"). 
## To use now, restart R or run `readRenviron("~/.Renviron")`
## [1] "d38c26d19748e1f9eaaa8bde2d2fe0c6b6b0d606"
## Getting data from the 2006-2010 5-year ACS
## Getting data from the 2007-2011 5-year ACS
## Warning: Column `State` joining character vector and factor, coercing into character vector
## Warning: Column `State` joining character vector and factor, coercing into character vector

Correlation

## 
##  Spearman's rank correlation rho
## 
## data:  corr.h.2010 and corr.i.2010
## S = 13213, p-value = 0.009049
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.365515
## 
##  Spearman's rank correlation rho
## 
## data:  corr.b.2010 and corr.i.2010
## S = 12906, p-value = 0.006451
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.3802523

## 
##  Spearman's rank correlation rho
## 
## data:  corr.h.2011 and corr.i.2011
## S = 13917, p-value = 0.0186
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.3317397
## 
##  Spearman's rank correlation rho
## 
## data:  corr.b.2011 and corr.i.2011
## S = 11855, p-value = 0.001792
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.4307471

Describe your results and include relevant tables, plots, and code/comments used to obtain them. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.